Learning Information Status of Discourse Entities
Abstract
In this paper we address the issue of automatically assigning information status to discourse entities. Using an annotated corpus of conversational English and exploiting morpho-syntactic and lexical features, we train a decision tree to classify entities introduced by noun phrases as old, mediated, or new. We compare its performance with hand-crafted rules that are mainly based on morpho-syntactic features and closely relate to the guidelines that had been used for the manual annotation. The decision tree model achieves an overall accuracy of 79.5%, significantly outperforming the hand-crafted algorithm (64.4%). We also experiment with binary classifications by collapsing in turn two of the three target classes into one and retraining the model. The highest accuracy achieved on binary classification is 93.1%.
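The binary experiments described above depend on a relabeling step: two of the three classes are merged into one before the model is retrained. The following is a minimal sketch of that relabeling, together with an accuracy helper. The function names, the joint-label format, and the toy labels are illustrative assumptions, not the authors' code or data, and the decision tree training itself is omitted.

```python
from typing import List, Sequence, Tuple

# The three information-status classes named in the abstract.
LABELS: Tuple[str, ...] = ("old", "mediated", "new")


def collapse(labels: Sequence[str], merged: Tuple[str, str]) -> List[str]:
    """Relabel every class in `merged` as one joint class; keep the third class.

    The joint-label format "a+b" is an assumption made for this sketch.
    """
    joint = "+".join(merged)
    return [joint if label in merged else label for label in labels]


def binary_schemes() -> List[Tuple[str, str]]:
    """All ways of collapsing two of the three classes into one."""
    return [
        (LABELS[i], LABELS[j])
        for i in range(len(LABELS))
        for j in range(i + 1, len(LABELS))
    ]


def accuracy(gold: Sequence[str], predicted: Sequence[str]) -> float:
    """Fraction of entities whose predicted label matches the gold label."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    return sum(g == p for g, p in zip(gold, predicted)) / len(gold)


if __name__ == "__main__":
    # Toy gold labels (hypothetical, not taken from the annotated corpus).
    gold = ["old", "new", "mediated", "new", "old"]
    for pair in binary_schemes():
        # A model would be retrained on these collapsed labels.
        print(pair, collapse(gold, pair))
```

Each scheme yields a separate two-class training set, so a figure such as the reported 93.1% refers to one particular collapsing rather than to all three.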
Similar resources
Learning the Information Status of Noun Phrases in Spoken Dialogues
An entity in a dialogue may be old, new, or mediated/inferrable with respect to the hearer’s beliefs. Knowing the information status of the entities participating in a dialogue can therefore facilitate its interpretation. We address the under-investigated problem of automatically determining the information status of discourse entities. Specifically, we extend Nissim’s (2006) machine learning a...
Learning the Fine-Grained Information Status of Discourse Entities
While information status (IS) plays a crucial role in discourse processing, there have only been a handful of attempts to automatically determine the IS of discourse entities. We examine a related but more challenging task, fine-grained IS determination, which involves classifying a discourse entity as one of 16 IS subtypes. We investigate the use of rich knowledge sources for this task in comb...
The Effects of Discourse Cues on Garden Path Processing
We report a self-paced reading study that investigated garden-path sentences like "While the boy washed {a/the} dog barked loudly" and "While the man hunted {a/the} deer ran into the woods". In such sentences, the critical noun phrase (dog, deer) tends to be misparsed as an object of the preceding verb, and has to be re-analyzed as a subject of the following clause when the disambiguating verb (e.g....
The Computation of the Informational Status of Discourse Entities
During language production, processes of information structuring constitute a relevant part. These processes are regarded as a mapping from a conceptual structure to a perspective semantic structure. I will focus on one aspect of information structuring, namely the verbalization of the current mental representation of entities. For this verbalization, the infor...
A Framework For Annotating Information Structure In Discourse
We present a framework for the integrated analysis of the textual and prosodic characteristics of information structure in the Switchboard corpus of conversational English. Information structure describes the availability, organisation and salience of entities in a discourse model. We present standards for the annotation of information status (old, mediated and new), and give guidelines for ann...